posterior concentration
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Bayesian Dyadic Trees and Histograms for Regression
Stéphanie van der Pas, Veronika Rockova
Many machine learning tools for regression are based on recursive partitioning of the covariate space into smaller regions, where the regression function can be estimated locally. Among these, regression trees and their ensembles have demonstrated impressive empirical performance. In this work, we shed light on the machinery behind Bayesian variants of these methods.
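To make the recursive-partitioning idea concrete, here is a minimal, non-Bayesian sketch of a regression histogram on a dyadic partition of [0, 1]: the covariate space is split into 2^depth equal intervals and the regression function is estimated locally by the bin means. The function names and the fixed depth are illustrative assumptions; the paper instead places a prior over such partitions and studies the resulting posterior.

```python
import numpy as np

def dyadic_histogram_fit(x, y, depth):
    """Fit a step function (regression histogram) on a dyadic partition of [0, 1].

    Illustrative sketch: the partition has 2**depth equal-length bins and the
    local estimate in each bin is the mean of the responses falling in it.
    """
    n_bins = 2 ** depth
    # Assign each covariate value to one of the 2**depth dyadic intervals.
    bins = np.clip((x * n_bins).astype(int), 0, n_bins - 1)
    # Local estimate: average of y within each bin (empty bins fall back to the global mean).
    beta = np.full(n_bins, y.mean())
    for k in range(n_bins):
        mask = bins == k
        if mask.any():
            beta[k] = y[mask].mean()
    return beta

def dyadic_histogram_predict(beta, x_new):
    """Evaluate the fitted step function at new covariate values in [0, 1]."""
    n_bins = len(beta)
    bins = np.clip((x_new * n_bins).astype(int), 0, n_bins - 1)
    return beta[bins]

# Example: a noisy step function recovered by the histogram estimator.
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = np.where(x < 0.5, 1.0, 3.0) + rng.normal(scale=0.3, size=500)
beta = dyadic_histogram_fit(x, y, depth=3)
```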
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Israel (0.04)
Posterior Concentration for Sparse Deep Learning
We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near-minimax rate for α-Hölder smooth maps, performing as well as if we knew the smoothness level α ahead of time.
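As an illustration of the spike-and-slab idea, the sketch below draws a weight matrix in which each entry is exactly zero with probability 1 - theta and Gaussian otherwise, so that only a sparse subset of weights stays active. The Gaussian slab, the fixed inclusion probability theta, and the function name are assumptions for illustration; the paper's actual prior and its treatment of the unknown smoothness level may differ.

```python
import numpy as np

def sample_spike_and_slab_weights(shape, theta=0.1, slab_scale=1.0, rng=None):
    """Draw a weight matrix from a generic spike-and-slab prior.

    Each entry is exactly zero (the "spike") with probability 1 - theta and is
    drawn from a Gaussian "slab" with probability theta, so most weights are
    pruned while a sparse subset remains active. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    active = rng.uniform(size=shape) < theta           # inclusion indicators
    slab = rng.normal(scale=slab_scale, size=shape)    # slab draws
    return np.where(active, slab, 0.0)

# Example: a sparse 256 x 128 weight matrix for a ReLU layer.
W = sample_spike_and_slab_weights((256, 128), theta=0.05)
```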
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable to, or better than, their conditioned counterparts, while also reducing parameter counts (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
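A minimal sketch of the Bernoulli edge-flip corruption named above: each potential edge of an undirected 0/1 adjacency matrix is toggled independently with a fixed flip probability. The function name and interface are illustrative assumptions, not the implementation used with GDSS or DiGress.

```python
import numpy as np

def bernoulli_edge_flip(adj, flip_prob, rng=None):
    """Corrupt an undirected graph with independent Bernoulli edge flips.

    Each potential edge (upper-triangular entry) of the 0/1 adjacency matrix is
    flipped with probability flip_prob; the result is symmetrized with a zero
    diagonal. A sketch of the corruption process, not the authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    flips = rng.uniform(size=(n, n)) < flip_prob     # independent flip indicators
    flips = np.triu(flips, k=1)                      # act on each vertex pair once
    corrupted = np.triu(adj, k=1) ^ flips            # XOR toggles 0 <-> 1
    return corrupted + corrupted.T                   # symmetric, zero diagonal

# Example: corrupt a random Erdos-Renyi graph at noise level 0.2.
rng = np.random.default_rng(1)
A = (rng.uniform(size=(50, 50)) < 0.1).astype(int)
A = np.triu(A, 1)
A = A + A.T
A_noisy = bernoulli_edge_flip(A, flip_prob=0.2, rng=rng)
```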
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Review for NeurIPS paper: Consistent feature selection for analytic deep neural networks
Additional Feedback: Below please find my detailed comments: 1. This paper does not consider feature selection in the high-dimensional setting. It considers an easier problem setting in which the total number of features is fixed and does not scale with the number of data points. It is important to study feature selection methods in the high-dimensional setting, where the total number of features may grow exponentially with the number of data points. However, various previous works actually already give selection consistency results, for example [1,2,3], and the selection consistency in [4, 5] can be easily obtained with a few simple arguments built upon their analysis.
Posterior Concentration for Sparse Deep Learning
We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near-minimax rate for α-Hölder smooth maps, performing as well as if we knew the smoothness level α ahead of time. These network attributes typically depend on unknown smoothness in order to be optimal. We obviate this constraint with the fully Bayes construction.
Reviews: Bayesian Dyadic Trees and Histograms for Regression
This paper analyses concentration rates (the speed of posterior concentration) for Bayesian regression histograms and demonstrates that, under certain conditions and priors, the posterior distribution concentrates around the true step regression function at the minimax rate. Different classes of approximating functions are considered, starting from the set of step functions supported on equally sized intervals and moving up to more flexible functions supported on balanced partitions. The most important part of the paper is the construction of the prior on the space of approximating functions. The paper is relatively clear and presents an interesting first theoretical result on the speed of posterior concentration for Bayesian regression histograms. The authors assume very simple conditions (one predictor, piecewise-constant functions), but this is necessary in order to obtain a first analysis.
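For concreteness, the step-function (regression histogram) model discussed in the abstract and this review can be written as below. This is a standard formulation rather than a verbatim transcription of the paper; the interpretation of "balanced partitions" as cells containing comparable numbers of observations is an assumption here.

```latex
% Step-function (regression histogram) model: the regression function is
% piecewise constant on a partition {Omega_1, ..., Omega_K} of the covariate space.
\[
  y_i = f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\]
\[
  f(x) = \sum_{k=1}^{K} \beta_k \, \mathbf{1}\{x \in \Omega_k\}.
\]
% Equally sized dyadic intervals correspond to Omega_k = [(k-1)/K, k/K) with K = 2^d;
% more flexible ("balanced") partitions allow variable-length intervals, assumed here
% to contain comparable numbers of observations.
```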